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We formulate and investigate a statistical inverse problem of a 
random tomographic nature, where a probability density function 
on 'M? is to be recovered from observation of finitely many of its two- 
dimensional projections in random and unobservable directions. Such 
a problem is distinct from the classic problem of tomography where 
both the projections and the unit vectors normal to the projection 
plane are observable. The problem arises in single particle electron 
microscopy, a powerful method that biophysicists employ to learn the 
structure of biological macromolecules. Strictly speaking, the prob- 
lem is unidentifiable and an appropriate reformulation is suggested 
hinging on ideas from Kendall's theory of shape. Within this setup, 
we demonstrate that a consistent solution to the problem may be 
derived, without attempting to estimate the unknown angles, if the 
density is assumed to admit a mixture representation. 

1. Introduction. The classical problem of tomography can be informally 
described as that of the determination of an object by knowledge of its 
projections in multiple directions. Problems of this nature arise in a variety 
of disciplines including medicine, astronomy, optics, geophysics and electron 
microscopy. Mathematically, the problem is formulated as that of seeking 
a solution to an integral equation relating a real function / : — > R to its 
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one-dimensional Radon transform (or X-ray transform), 

/ + 00 
fix + rOdr, eeS""' andxee^. 
-oo 

Under regularity conditions on /, the Radon transform can be seen to be 
invertible, and the function / can be recovered by means of explicit formulas 
which we omit (Helgason [22] and Jensen [26]). 

In practical situations, such as X-ray imaging (e.g., Shepp and Kruskal 
[47]), one seeks to determine / given finitely many pairs of orientations 
and corresponding profiles {{Ci, f{Cir))}i^=i- Several algorithms have been 
proposed to address this problem, and these are often problem specific, al- 
though one may single out broad classes, such as Fourier methods (based 
on the projection-slice theorem) and back-projection methods (see Natterer 
[38]). The subject matter and mathematical literature on such problems and 
their solution approaches is vast (see Deans [10] for a succinct overview). 

In statistics, the tomographic reconstruction problem manifests itself most 
prominently in the case positron emission tomography (PET), which can 
be seen as a special type of density estimation problem where a density / is 
to be estimated given a random sample {(Hj,Xj)}"^^ from a density propor- 
tional to /(^,x) (see Shepp and Vardi [48] and Vardi, Shepp and Kaufman 
[53]). PET lends itself to statistical treatment through a broad range of 
techniques such as likelihood-based, orthogonal series (singular value de- 
composition) and smoothed backprojection techniques, to name only a few 
(e.g., Vardi, Shepp and Kaufman [53], Silverman et al. [49], Green [20], Jones 
and Silverman [28]). Naturally, theoretical aspects such as consistency and 
optimality have also been widely investigated (e.g., Chang and Hsiung [8], 
Johnstone and Silverman [27] and O'Sullivan [39]). Further to PET, statisti- 
cal problems such as random coefficient regression and multivariate density 
estimation have also been treated by means of insights and techniques gained 
from the field of tomography (Beran, Feuerverger and Hall [2] , Feuerverger 
and Vardi [14] and O'Sullivan and Pawitan [40]). 

In this paper, we formulate and investigate a stochastic variant of the 
classical tomographic reconstruction problem, where the profile orientations 
are not only random, but are in fact unobservable. This variant arises in the 
electron microscopy of single biological particles (see Section 1.1), a pow- 
erful method that biophysicists employ in order to study the structure of 
biological macrocolecules. It is qualitatively different from the usual tomog- 
raphy settings, where reconstruction crucially depends on the knowledge of 
the projection directions {(^i}^^^. When the latter are unavailable, it is nat- 
ural to ask whether anything interesting can be statistically said about the 
unknown density. We explore the limitations that are inherent when try- 
ing to answer such a question, and propose a mixture framework where the 
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three-dimensional structure can be consistently estimated up to an orthogo- 
nal transformation, without attempting to estimate the unknown projection 
angles. 

The paper is structured as follows. In Section 1.1 we present an informal 
introduction to the missing angle tomography problem of single particle 
electron microscopy. We then formulate the problem statistically (Section 
2) and discuss its main aspects and the relevance of shape-theoretic ideas 
(Section 3). We then proceed to introduce a parametric framework (Section 
3.1) which allows for a "statistical inversion" in the shape domain (Section 
4). Illustrations are provided in Section 5 and the paper concludes with some 
remarks in Section 6. 

1.1. Single particle electron microscopy. Resolving the structure of a bi- 
ological particle is an undertaking that involves piecing together numerous 
facets of a complex investigation. The most important of these is, perhaps, 
the three-dimensional visualization of the particle whose dimension can be 
of the order of Angstroms (1 A = 10"^*^ m). Although it is X-rays that have 
traditionally been associated with particle structure determination, electron 
microscopy has arisen as a powerful tool with important advantages in these 
endeavors (Chiu [9], Frank [15], Glaeser et al. [19] and Henderson [23]). 

The structure of a biological particle is described by the relative position- 
ing of its atoms in space. Each atom's electrons create a potential around it, 
and the ensemble of these potentials gives rise to a potential distribution in 
three-dimensional space, the shielded Coulomb potential distribution, which 
is typically assumed to admit a potential density function, say p{x, y, z). The 
structure of the particle is then described by this density. 

This potential density provides the means of interaction with the electron 
microscope's beam: when the beam passes through the specimen (particle) 
in the ^-direction, there is a reduction to the beam intensity caused by the 
scattering of electrons due to the interaction with the specimen. According to 
the Abbe image formation theory (Glaeser et al. [19]), the intensity recorded 
on the film under the specimen is approximately linear in the projection 
of the particle density in the z-direction, p{x,y, z) dz. Therefore, the 
imaging mode of the electron microscope provides us with a sample profile 
from the Radon transform of the particle's potential density. 

While as such, the problem should be amenable to the "traditional" to- 
mographic reconstruction techniques, things in practice are not as straight- 
forward due to the problem of radiation damage (Glaeser [17]). Extended 
exposure to the electron beam will cause chemical bonds to break, and thus 
will alter the structure of the specimen. It follows that it is impossible to im- 
age the same particle under many different orientations. The exposure should 
instead be distributed over many identical particles. This can be achieved 
by crystallizing multiple particles (Drenth [12]) but reliance on crystals has 
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Fig. 1. (a) Random profiles of pyruvate-ferredoxin oxidoreductase (PFOR) obtained via 
single-particle electron microscopy at the Lawrence Berkeley National Laboratory ( Cour- 
tesy of F. Garczarek and R. M. Glaeser). (b) Reconstruction of the three-dimensional 
potential density after the projection angles have been estimated (Garczarek et al, [16]). 



several fundamental drawbacks (Frank [15] and Glaeser [18]). Single particle 
cry o- electron microscopy is a technique of electron microscopy, that aims at 
obtaining a three-dimensional representation of the particle without crys- 
tallizing the sample (e.g., Glaeser [18]). In this approach, a large number 
of structurally identical particles are imbedded unconstrained (i.e., uncrys- 
tallized) in an aqueous solution, then rapidly frozen and finally imaged via 
the electron microscope. Since the particles are unconstrained, they move 
and rotate freely within the aqueous solution, assuming haphazard orienta- 
tions at the moment they are frozen. After preliminary processing, the data 
yielded are essentially noisy versions of the projected potential densities, 
at random and unknown orientations. Figures 1 and 2 present character- 
istic examples of such data in the presence of noise (Figure 1), and in the 
ideal — but practically impossible — noiseless case (Figure 2), for two different 
particles. 




120° 



♦ < # 



Fig. 2. (a) A model of the three-dimensional potential density of the human transla- 
tion initiation factor elFS derived from single particle data after angle estimation (Siri- 
dechadilok et al. [50]). (b) Noiseless random projections obtained from the known model 
(Courtesy of R. J. Hall). 
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Biophysicists typically proceed via attempting to estimate the unobserv- 
able orientations, in order to then be able to iteratively use standard tomo- 
graphic techniques (Frank [15] and Glaeser et al. [18]). However, they often 
rely on a priori knowledge on the structure of the particle either from other 
experiments or from an ad hoc examination of the projections by eye, in 
order to perform this estimation. Once an initial model is provided, it may 
be used to estimate the unknown angles and update the estimate. In cases 
where previous structural information is not available, and a naked eye ex- 
amination is either not feasible (e.g., when the particle has no symmetries) 
or would best be avoided, it is natural to wonder whether an "objective" 
initial model can be extracted directly from the data, without attempting 
to estimate the unknown angles. 

2. A stochastic Radon transform. We may distinguish three important 
aspects in the random tomography problem that arises in single particle 
electron microscopy: (I) the samples of the Radon transform are obtained at 
haphazard orientations ^ which are thought as random, (II) the physics of 
the data collection process allows for the possibility of within-plane rotations 
in a projection, (III) one does not observe the projection orientations. 

These aspects become clear once we have a precise working definition and, 
for this reason, we define a random analogue to the Radon transform. Let 
(S0(3),©) be the measurable space of special orthogonal matrices, with © 
the Borel cr-algebra generated by the standard Riemannian metric on S0(3). 
Also, let (L^(A2), «) be the measurable space of square integrable functions 
on the disc A2 := {x € : ||x|| < vr}, equipped with the Borel cj-algebra « 
generated by the L^-norm. 

Let /9:M'^ — > [0,oo) be probability density function centered at zero. Since 
the object of any tomographic probe is necessarily finite, we shall restrict our 
attention to densities that are supported on the ball A3 := {x G M"^ : ||x|| < vr} 
and essentially bounded (i.e., esssupp < 00). 

We define the projection operator of p as the mapping n{/9}:SO(3) — > 
L^(A2) given by 



where Ap{x.) := p{A~^x.) for x G M'^ and A G S0(3). Given an element Aq G 
S0(3), the function n{/9}(Ao) is the projection (or profile) of p at orienta- 
tion Aq. In particular, Il{p} is well defined as a random element of L^(A2) 
if we equip (S0(3),®) with a probability measure. 

Proposition 2.1 (Measurability). Let p-.A^^ [0,oo) be an essentially 
bounded probability density function centered at the origin. The projection 
operator n{p} is a measurable mapping from (50(3),®) to (L^(A2),«). 



(2.1) 




G S0(3) 
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Proof. Let ^: A3 ^ M be a continuous function, and let An - — > A 
be a convergent sequence in S0(3). By continuity, it holds that AnO — > 
A9 pointwise [recall that An6{x) = 9{A~^ii)\. Combining this fact with 
the bounded convergence theorem shows that lim„^oo(n{0}(^„,))(2;, y) = 
(n{6l}(A))(x,y), for all (x,y) G M^^ ^j^g bounded convergence theorem then 
implies me]{An) - n{0}(A)||2 ^ 0. 

Now let p be as in the assumptions of the theorem and let e > 0. By 
Lusin's theorem, there exists a continuous function 9^ : A3 — > M such that 
Leb{x G A3 : p(x) 7^ ^£(x)} < e. By the triangle inequality, 

iin(p)(^„) - u{p){A)h < m{p){An) - m){An)\\2 

+ ||n(^,)(A„)-n(e,)(^)||2 
+ me,){A)-uip){A)h. 

If we let n — > 00, our earlier analysis shows that the second term will vanish. 
The first and third term on the right-hand side are bounded above by e- C, 
for some finite C > since p is nonnegative and essentially bounded, while 9^ 
is bounded. Since the choice of e is arbitrary, this establishes the continuity 
of njp} with respect to the relevant topologies and its measurability with 
respect to the corresponding Borel <T-algebras. □ 

For > 1, let {An}n=i be independent and identically distributed ran- 
dom elements of the special orthogonal group S0(3) distributed according 
to normalized Haar measure. We define the stochastic Radon transform of 
length N oi p as the i.i.d. collection of random projections {Il{p}{An)}n=i, 
taking values in the sample space (L^(A2),«). These realizations of inde- 
pendent projections are not coupled with the corresponding orientations, 
that is, we observe Il{p}{An) but not An. For this reason, we suppress the 
dependence on An whenever this does not cause confusion, and write pn for 
Il{p}{An). From the classical statistical perspective, we observe that any 
centered essentially bounded density p on A3 induces a probability measure 
Pp on the measurable space (L'^(A2),«) via 

Pp[B] = ^{AeS0{3):U{p}{A)eB}, Be'B, 

with ^ denoting normalized Haar measure on (S0(3),©). The stochastic 
Radon transform of length of p is then an i.i.d. random sample from the 
distribution Pp (a collection of independent random fields with law Pp). 

3. Invariance and shape. We wish to consider the recovery of a density 
given its stochastic Radon transform, that is, to investigate the feasibility of 
a statistical inversion of the stochastic transform. When seen as an estima- 
tion problem, the recovery problem exhibits certain special group invariance 
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properties; these are manifested both in the parameter space as well as in 
the sample space, as unidentifiability and sufficiency, respectively. 

Focusing first on the parameter space, we recall that a parametric family 
of models (distributions) {Pg} with parameter space is identifiable if the 
mapping Pg is a bijection. The probability model parameterized in our 
case is the distribution of a random profile n{^} with 9 £3', where 3' is the 
set of essentially bounded probability densities supported on A3 that are 
centered at the origin [understood as a subset of the metric space ^^(Aa)]. 
However, the following parametrization is not well defined, in the sense that 
it is unidentifiable: if S e 0(3) so that B'^B = I and det{B) £ {-1, 1}, we 
may put Q = diag{l, l,det(i?)}, and observe that 

/+00 f+00 , 

AB9{x,y,z)dz= / ABQ9{x,y,det{B) ■ z) dz = U{e} 
-00 J —00 

for any A ~ Haar[S0(3)], by right invariance of Haar measure. 

It follows that the probability law Pp induced on L^(A2) by the parameter 
p£ 3' is the same as the law P^p for any B G 0(3). This suggests that ideally 
we could only recover the original function modulo 0(3), which leads to the 
need for a parametrization of the model in terms of those characteristics 
of the functions of 9" that are invariant under orthogonal transformations. 
Formally, let G{9') = {-ja '■ A £ 0(3)} be the group of rotations and reflections 
on the function class J', with action 

(3.1) (7A/)(n) = f{A"'u), A G 0(3), / G J. 

Define the shape [f] of a function / G 9" as its orbit under the action of G{S') 

(3.2) [/]={7(/):7GG(J)}. 

Consequently, we call the quotient space 3'/G{3') the shape space of 9", and 
we denote it by T,3'. While we saw that we cannot recover "more than [/>]" 
from the stochastic Radon transform of p, we prove next that shape can be 
potentially recovered given a sample from the stochastic Radon transform. 

Theorem 3.1 (Singular identifiability) . Let 3^ be the set of probability 
densities supported on A3 that are centered at the origin and are essentially 
bounded. For 9 £ 3', let P[g^ denote the probability distribution induced on 
the sample space (L^(A2),!b) by [9] £ via the stochastic Radon transform 
n{/i} of any h £ [9]. Then, for any two distinct elements [/], [g] £ T,3', the 
measures Pjj] and P^gj are mutually singular. 

To prove this theorem, we will make use of the following result on Radon 
transforms (see Proposition 7.8 in Helgason [22]). 
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Proposition 3.1. Let /iM*^— >M he function of compact support and 
H he an infinite subset of the unit sphere S^. Then f is determined hy the 
collection {/^jggS; where 

r+oo 



J —CO 



Proof of Theorem 3.1. Assume that [/], [g] e EJ" and [/] ^ [g]. Since 
= Pg and P[g] = Pg, it suffices to show that Pj J- Pg. Since the shapes 
[/] and [g] are distinct, we have 

f^Bg Vi?GS0(3). 

It follows that given any T £ 50(3), the set {A G 50(3) : n{/}(^) = U{Tg}{A)} 
has Haar measure zero, 

(3.3) ^{AeS0{3):U{f}{A)=U{Tg}{A)} =0 VP G 50(3). 

For if this were not the case we could find an uncountably infinite set H C 
such that = (P^)^ for all ^ G H, where 

r'+OO 



h^{x)= h{x + Ti)dT, xG^ 

J ~oo 



So, by means of Proposition 3.1, we would conclude / = Tg, contradicting 
our assumption. 

Consider an arbitrary coupling of Pf and Pg, that is, let (50(3),©,'!') 
be as before, and let h : (50(3), ©) (50(3), ® ) be any measurable function 
such that ^{A G 50(3) : n{5}[/i(^)] G •} = Pg[']. 

Initially, we assume that h{-) is continuous. Since 50(3) acts transitively 
on itself, we may represent h as 

h{A)=ATA, Pa e 50(3). 

By continuity of h, it follows that A h-> P^ is also continuous. 

Now, let {A„}„>i be a monotone sequence of partitions of 50(3) that 
become finer as n increases. That is, A„ partitions 50(3) into n disjoint 
sets {AJjjf^]^ with the property that for every j G {1, . . . , n + 1} there ex- 
ists an ij G {1, . . . ,n} such that A^^^i C An ■ Define a sequence of "simple" 
measurable functions 



hn{A)=Ar2 on the set A 



where PJ^ is defined as 



Ti := argmi^ | min \\U{f}{A) - U{g}{AT)\\2 
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for yi„ the closure of A^^. The above is well defined by compactness of A^^ 
and continuity of Ai— > Ta- Now, since h is continuous, we have 

hn^h VAgS0(3). 

Hence, by continuity of the projection mapping and by the dominated con- 
vergence theorem, we have, for all A £ S0(3), 

(3.4) ||n{/}(^) - U{g}{K{Am2 T ||n{/}(^) - U{g}{h{A))h. 

That the convergence is monotone follows from the definition of /i„ : the par- 
tition sequence is monotone and for A E Q A^^ we have that {Fyi : A G 
An^i} C {r^ : A £ A^n}. We now proceed to define the sets 

Xn := {A e S0(3) : ||n{/}(A) - U{g}{K{A))\\2 > 0}, 
X:={A€ S0(3) : \\n{f}{A) - U{g}{h{A))h > 0}. 
By definition of 

Xn C Xn+i Vn > 1. 
Therefore, 3C„ t U^=i 3C„, where 

oo oo 

[jXn=[j{AG S0(3) : \\U{f}iA) - n{g}(/i„(A))||2 > 0} 

n=l n=l 

= {Ag SO(3)|3n > 1 : ||n{/}(A) - nM(/i„(A))||2 > 0} 
= {Ae S0(3) : \mf}{A) - U{g}{h{A))h > 0} 
= X. 

The penultimate equality follows from the monotone convergence given in 
(3.4). Continuity of ^ from below leads us to the conclusion 



^[X] = 'i' 



lim ^[3C„] 



[J Xn 

.n=l 

On the other hand, by definition of hn, we have that, for all n > 1, 
nXn] = ^{A G S0(3) : ||n{/}(A) - U{g}iK{A))h > 0} 

^ [J{Ag S0(3) : mniA) - n{g}iKiA))h >0}nAl, 



.1=1 

n 



|+J{^ G yi^ : ||n{/}(^) - u{g}iArjh > 0} 

i=l 

Y: ^{A e K ■■ \mf}{A) - Ii{Tlg}{A)h > 0} 



i=l 
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n 
i=l 

by appealing to the first part of our proof (3.3). In summary, we have shown 
that 

= e S0(3) : Ii{f}{A) / Ii{g}{h{A))} = 1. 

Now consider the case where h is measurable, but not continuous. We 
recall that Lusin's theorem guarantees that for any 5 > there exists a 
closed set "Ks (and hence compact in our case) and a continuous function hs 
such that h = hs on 'Ks while ^'(50(3) \ "Ks) < 6. Therefore, for arbitrary 
measurable h, and given any 6 > 0, 

= G : n{/}(A) / U{g}ih{A))} 

+ ^A G S0(3) \ Jis : n{/}(A) / U{g}{hiA))} 

= ^A G : n{/}(A) / U{g}{hs{A))} 

+ MA G S0(3) \ : U{f}{A) / n{<7}(/i(^))} 

> ^{A G %s : n{/}(A) / n{5}(/i5(^))} 

= ^'[:K5] 

> l-(5. 

The choice of 5 being arbitrary, we conclude that the event {!!{/} ^ Il{g}} 
has probability 1 for an arbitrary coupling. Strassen's theorem now implies 
that the total variation distance between Pj and is 1, which completes 
the proof. □ 

It follows that, under isotropic projection orientations, the unknown den- 
sity is identifiable up to an orthogonal transformation, regardless of whether 
or not we observe the projection angles [in fact, this remains true if Haar 
measure ^ is replaced by any measure ^' on S0(3) with <C ^']. 

Shape is not just crucial as a notion in the context of the parameter space 
only. Under isotropic projection orientations, any orthogonal transforma- 
tion of the two-dimensional projection data contains the same information 
on the function- valued parameter. Letting G(L^(A2)) denote the group of 
rotations and reflections on L^(A2), we define the shape of an element / 
in the sample space L^(A2) as [/] = {«(/) - a G G(L^(A2))}. We equip the 
corresponding shape space (quotient space) M := L^{A2)/G{L^{A2)) with 
the Borel tr-algebra 5W generated by the quotient topology. This turns the 
quotient space into a measurable space, and the quotient mapping into a 



RANDOM TOMOGRAPHY 



11 



measurable mapping, that is, a statistic. The next proposition estabhshes 
that the shape statistic is sufficient with respect to the original shape (see 
page 85 of Schervish [46] for the definition of abstract sufficiency). 

Theorem 3.2. Let 'J be the set of probability densities supported on A3 
that are centered at the origin and are essentially bounded. For 9 £3', let Pjg] 
denote the probability distribution induced on the sample space (L^(]R^),«) 
by [9] G 513" via the stochastic Radon transform n(/i) of any h £ [9]. The 
mapping n(/i) 1— > [n(/i)] is a sufficient statistic for the parameter [9] and a 
maximal invariant statistic with respect to the group G{L'^ {A2)). 

Before we prove Theorem 3.2, we prove a lemma and recall two results 
from measure theory. 

Lemma 3.1. Let [9] G and P[q] be the law ofIl{h}, induced by h G [9]. 
Then, given any B £ CB , 7 G G{L'^ {A2)), it holds that P[0]{B} = P[0]{'yB}. 

Proof. There exists aW £ 0(2) such that for A ~ Haar[S0(3)] 

7[nW(A)(x,y)]^|_^"(^ ^J^^^'^ A9ix,y,z)dzlu{9}iA){x,y), 

the last equality following from the left invariance of Haar measure. □ 

The next two results can be found in Lemma 1.6 and Theorem 2.29 of 
Kallenberg [29]. 

Lemma 3.2. Let {M,d) be a metric space with topology T and Borel a- 
algebra si. Then, for any D C M, the metric space {D,d) has topology 7nD 
and Borel a -algebra (1 D. 

Proposition 3.2. Let G be a locally compact second countable Haus- 
dorff group that acts transitively and properly on a locally compact second 
countable Hausdorff space S. Then, there exists, uniquely up to renormal- 
ization, a G-invariant Radon measure fJ,^0 on S. 

Proof of Theorem 3.2. Maximal invariance follows immediately from 
the definition of shape as the orbit under the group of orthogonal transfor- 
mations. To prove sufficiency, we note that the space (L^(A2),«) is a stan- 
dard Borel space since it is complete and separable in the metric induced by 
the L^-norm. It follows that there exists a regular conditional distribution 
u(B\[9],m):-B X x M ^ [0, 1] for U{9) given [U{9)], 

(3.5) u{B\[9],m):=P[g]{B\[U{9)] = m}, B£'S,meM. 
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Therefore, sufficiency will follow if we can show that i^(-B|[^],m) is function- 
ally independent of [6], that is, = r{B,m),\/w G 519". We begin by 
observing that i^{m\w,m) = 1. Therefore, i'{-\w,m) can be viewed as a prob- 
ability measure on (m, « Hm), where « Plm := {mDA ■.AG'b}. Now let T be 
the natural topology of L^(A2), so that « n m = (t(T) n m = (7(Tn m) is the 
Borel (T-algebra of subsets of m, generated by the subspace topology Tn m 
(Lemma 3.2). But (m,Tnm) is a locally compact second countable Haus- 
dorff space. Hence, by Proposition 3.2, there exists a unique Radon mea- 
sure (up to constant multiples) r{B,m) on (m, « n m) such that r{B,m) = 
r{'yB,m), for all 7 S G(L^(A2)) and i? E « Plm. But Lemma 3.1 implies that 
u[B\w, m) = i'{'yB\'w, m) for all 7 S G(L^(A2)) and all S G « H m, and is 
a probability measure. Consequently, it must be that i^{B\w,m) = Xr{B,m) 
for some A > 0, which completes the proof. □ 

It follows that our analysis should concentrate on the concept of shape. 
On the one hand, it is the shape of the unknown density that we seek to 
estimate; on the other hand, we should base our estimate on the shape of 
the projections, that being a sufficient statistic. 

A systematic mathematical study of shape in the case of finitely many la- 
beled points in Euclidean space was initiated by Kendall [31]; his motivation 
was the question of existence of alignments in megalithic stone monuments 
(Kendall and Kendall [32]). In Kendall's approach, shape is the collection 
of those characteristics of a labeled point pattern that are invariant under 
rotation, translation and scaling. The shape spaces induced have a mani- 
fold structure, and their geometry depends both on the number of points 
and the dimension of the ambient space (Le and Kendall [34]). A closely 
related concept of shape with a different "representation theory" was in- 
dependently proposed by Bookstein [3], who was interested in biological 
applications and morphometries. In Kendall's terminology, our version of 
shape would be called ^^unoriented shape- and-size,'" to stress the fact that 
0(3) is quotiented out while scalings are not. Kendall and Le [33] provide a 
compendious review of statistical shape theory. 

Though shape spaces of finite point patterns are well understood and 
widely used in applied work, there is apparently no practical formulation 
of the shape of a function. An active field of research focuses on practical 
parameterizations of the shape of closed curves on the plane and in space 
(e.g., Younes [54] and Small and Le [51]), the principle motivation being 
computer vision. Such ideas do not appear useful, though, when attempting 
to find connections between the shape of a function and the shape of its in- 
tegral transform. For this reason, we will hinge on Euclidean shape-theoretic 
ideas that will enable us to establish such connections, via an appropriate 
parametrization (see Panaretos [41], Panaretos [42] and Panaretos [43]). 
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Fig. 3. Synthetic single-particle projection data from the Klenow fragment of Escherichia 
coli DNA Polymerase I (Courtesy of A. Leschziner) . The projections resemble mixtures of 
roughly circular components contaminated by noise. 



3.1. Parametrization: radial expansions and the Gram matrix. At least 
three ingredients come into play when considering a parametrization for 
the shape of a density in this particular context. First, it is important that 
the parametrization allow for the problem to be posed as one of parameter 
estimation. In addition, one may ask for a parametrization that makes it 
feasible to always explicitly be able to pick out a representative member 
from a particular shape class. Most important is the need to be able to 
find a connection between original shape and projected shape. In essence, 
what we are asking for is a parametrization that will allow us to convert the 
abstract setting of quotient spaces into something we can put a handle on. 

With these general considerations in mind, we choose to focus on the 
following parametric yet flexible class of finite mixtures of radial location 
densities: 

K K 

(3.6) p{x) =^qkip{x\fLk), /ifc G M^, Qfc > 0, ^ % = 1, 

k=l k=l 

where <f{-\£,) is a spherically symmetric probability density with expecta- 
tion ^, so that (p{-\£,) = /(||x — ^11) for some probability density /iM"*" — ^M"^. 
The choice of this particular type of expansion appears useful both from 
the applied and the mathematical points of view. The optics of the imaging 
procedure have a smoothing effect on the planar densities recorded on the 
film. As a result, the projected particles often do appear as an ensemble of 
roughly circular "blobs" (see, e.g., Figures 1, 2 and 3). Mixtures of Gaus- 
sians have previously been employed to obtain Riemannian metrics between 
biological shapes in deformation shape analysis (e.g., Peter and Rangarajan 
[44]). 

From a mathematical point of view, this type of density is well behaved 
under orthogonal transformation and projection. For any A G 0(3), 

(3.7) (^(A^xll) = /(P^x - III) = /(||x - AiW) = ^(x|^|), 
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and letting H be the projection onto the plane 2; = 0, 

r+00 



ip{x, y, z\A(,) dz = (j){x, y\n), 
J— 00 

(3.8) 

/+00 
ip{x,y,z\0) dz, 
-00 

so that any rotation of the density can be encoded by a rotation of the 
location parameters /i^, while its two-dimensional profiles can be expressed 
as a mixture of the marginal of y?, regardless of the projection orientation. 

To remove the effects of location, assume that the density is centered 
with respect to its location vectors, that is, assume X^a^i Afc = 0. Since any 
rotation of p can be encoded by a rotation of its location vectors, we may 
use the characteristics of the location vectors to encode the shape of p. 
The Gram matrix generated by the collection {p^} is the K x K symmetric 
nonnegative definite matrix, whose ijth element is the inner product {pi,pj), 
as follows: 



(3.9) Gram({/ifc}) 



/ llwf (W,A2> ••• {pi,pk)\ 

ihifj-l) \\P2\\^ ■■■ {h,f^K. 



\{pK,Pl) ■■■ ll/^i^lP / 

In Kendall's shape theory. Gram matrices are employed as a coordinate 
system for the shape manifold induced by rigid motions. Note that if the 
vectors {pk} are arranged as the columns of a 3 x X matrix V, then we 
may simply write Gram {V) = V^V. The Gr am matrix is invariant under 
orthogonal transformations of the generating vectors, since for i? £ 0(3) we 
immediately see that Gram(SF) = B'^ BV = V^V = Gram(y). Further- 
more, given a Gram matrix of rank p, one can find K vectors in M'^, d>p, 
with centroid zero whose pairwise inner products are given by that Gram 
matrix. In fact, the specification of such an ensemble amounts to merely 
solving nondegenerate lower triangular linear systems of equations. 

We can thus define the shape of a (/j-radial mixture as the coupling of its 
mixing proportions with the Gram matrix generated by its location vectors: 

(3.10) [p] = (Gram({/ife}),te}). 

We call the two components of this parametrization the Gram component 
and the mixing component, respectively. The shape of a profile of p, say 
po(x,y) = J2f=iQk4>{^:y\HAQpk), corresponding to a rotation Aq G S0(3) 
win then be given by [po] = {Gram{{HAopk}),{qk})- 

Our interest now is in establishing a relationship between the shapes of 
the Radon profiles of a density and the shape of the original density. 
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4. Statistical inversion. Since the coefficients involved in tlie radial mix- 
ture expansion are invariant under projection, the Gram matrix of the lo- 
cation vectors becomes the primary object of interest. Especially in view of 
sufficiency, we seek a relationship between the Gram components of the pro- 
jected shape and the original shape. The following theorem provides such a 
connection and can be seen as an inversion in the shape domain. 

Theorem 4.1 (Shape inversion). Let V be a d x k matrix, 2 < d < oo, 
k < oo, whose columns encode an ensemble of k elements of with cen- 
troid zero. Let ^ be normalized Haar measure on SO{d) and H denote the 
projection onto a subspace of dimension d — 1. Then, 



Proof. We may assume that H = diagjl, ...,1,0} without loss of gen- 
erality. We notice that Gram{HAV} = A'^HAV, since H is symmetric 
idempotent and recognize that A^ HA is the spectral decomposition of a 
projection onto the plane {A~^x:x G Im(ff)}, where lm{H) is the image 
of H. As such, we should be able to encode the same projection relying 
solely on a unit sphere parametrization, as opposed to using the special 
orthogonal group. Indeed, B'^HB = I — uu for B ^ Haar[SO((i)] and u 
a uniformly random unit vector, u ~ 'U(S'^~"'^) (/ — uu~^ is the projection 
onto U"*-). Hence, the proof of the theorem reduces to verifying that, for 
u ~ 'U(S"'~"'^), E[titi~''] = d~^L. The uniform distribution on the hypersphere 
is invariant under orthogonal transformations, Wu = u, \/W G 0{d). There- 
fore, K[uu ] = WK[uu ] for all 1^ G 0(d), implying that E \uu \ = cl for 
some constant c G M. Finally, note that trace (E[uu~'']) = trace(E[n'''u]) = 1, 
so that it must be that c = d~^ and the proof is complete. □ 

The relation in (4.1) reminds one of Cauchy's formula and other related 
stereological results, where the key ingredient is the isotropy of the projec- 
tion hyperplanes (see, e.g., Baddeley and Jensen [1]). 

Theorem 4.1 says that the expected projected Gram matrix is propor- 
tional to the original Gram matrix. Thus, supposing that we can estimate 
{qk} consistently by some estimator {g^}, an obvious consistent estimator 
is given by 



(4.1) 




d-l 
d 



Gram{F}. 



(4.2) 




which is essentially a method of moments estimator coupled with {qt}- Un- 
fortunately, things are not so straightforward. Given any profile pnix,y) of 
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p, the expansion pn{x,y) = J2k=i1k4'{x,y\Anflk) is unobservable. Contrary 
to the case of orthogonal expansions in Hilbert space, there is no transform 
corresponding to this expansion, so that the unobservable elements {qk}k=i 
and {Anllk}i<k<K,i<n<N must be estimated from the data. A further subtle 
problem thus arises: since the expansion is unobservable, the correspondence 
of the indices are also unobservable. When looking at a projection, regard- 
less of how we arrange the location vectors to build the Gram matrix and 
coefficient vector, the information encoded is the same. However, we must 
be able to choose this arrangement consistently across all projections, since 
we will be averaging the Gram matrices across projections. If the indices are 
unobservable, guaranteeing this consistent construction of the Gram matri- 
ces is nontrivial. To surpass this further unidentifiability issue, we impose 
an assumption on the mixing components. 

Assumption 4.1. The components of the density are distinguishable, 
that is, in the setup given in (3.6), we further assume that qi / qj, 

Assumption 4.1 allows us to use the auxiliary parameters (the mixing 
and perhaps the scaling coefficients, if these are included) estimated from 
the data to recover the unobservable labeling. A hybrid maximum likeli- 
hood/method of moments (MoM) estimator is presented in the next section. 

4.1. A hybrid estimator. In this section, we propose an estimator of the 
shape of the unknown density when this can be expanded as a finite mix- 
ture satisfying Assumption 4.1. The estimator is a hybrid estimator, fusing 
together a maximum likelihood estimator of the unobservable profile expan- 
sions with a method of moments estimator of the final Gram matrix in light 
of Theorem 4.1. For simplicity and tidiness, we will treat the planar case. 
The treatment of the d-dimensional case, d>3, is directly analogous. 

In the planar case, we have p:M? ^ [0, oo), an essentially bounded density 
function supported on the disc of radius vr, A2 = {x G : ||x|| < vr}. We 
let be a positive integer and {An}n=i be independent and identically 
distributed random elements of the special orthogonal group S0(2) drawn 
according to the corresponding normalized Haar measure. Finally, we write 
^p(x) := /9(^~^x) for x G and A G S0(2). The corresponding stochastic 
Radon transform is the collection of projections 

/+00 
Anp{x,y)dy, 2;G[-7r,7r]. 
-00 

Let (^(-IC) be a planar radial density function centered at ^, and let 0(2;|O) = 
ip{x,y\0) dy be a symmetric one-dimensional location density, centered 
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at the origin. Our model is 

K 

(4.4) p{x,y) = ^qkip{x,y\jXk), qii^ qj, '^ii^ j, 

k=i 

so that the nth profile is pn{x) =J2^^=lQk4'{x\f^^l^^). Here, /u^"^ G [— ''r,7r] de- 
notes the projection of the kth location vector in the nth profile of the 
stochastic Radon transform: = HAnflk, H = (1,0). Since we assume 
that p is supported on the disc A2, it must be that diam{supp((^)} < 27r. 

In practice, we observe a discrete version of the profiles, on certain lattice 
points {xt]J=i E [— 7r,7r], for T a positive integer. In particular, we assume 
the lattice to be regular, that is, the xt being equally spaced. Furthermore, 
the digital images {In}n=i of profiles will be contaminated by additive 
noise, which is assumed to be Gaussian and white, 

K 

(4.5) In{xt) = Pn{xt) + e„(t) = QkHxtlp'C^) + Snit) 

k=l 

for 1 < n < and 1 <t <T. Here, {Snit)} is a collection of independent 
Gaussian white noise processes on {1, . . . ,T} with variance cr^. By indepen- 
dence, both between and within the white noise processes and the random 
rotations in (4.3), we may write down the following log-likelihood expression 
for the parameters of the unobservable mixture expansion: 



2Tr ^ ^ 

(4.6) %,9)oc-— EE 

n=lt=l 



K 

In{xt) -Yqk(l){xt\pf^] 



k=l 

Maximization of this log-likelihood requires that we choose NK location pa- 
rameters (K for each profile), as well as a unique set of K mixing proportions 
to be shared across the profiles, so as to minimize the residual sum of squares 
between the observed and stipulated profiles (note that K is assumed to be 
known). Our hybrid estimator for the shape of the two-dimensional density 
p is then formally written as 

_ / 2 ^ \ 
(4-7) [p]=hr7E'^^3'^(i/^t"^}f=i)'i^fc}f=i , {fl,q)=avgmaxi{p,q). 



G 



The hybrid estimator is consistent and asymptotically Gaussian. Here, asymp- 
totic refers to both important aspects of the problem: the resolution T and 
the number of profiles A'^. The T — > 00 asymptotics relate to the decon- 
volution procedure, while the N ^ 00 asymptotics relate to the inversion 
stage. Depending on how one defines the hybrid estimator, there may be 
an interesting relationship between the two. To state these results, we point 
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out that the underlying probabihty space is a product space {Q,g,¥) = 
X Q2,<7(®i X ©2),lPi X 1P2)- The triple (Q2,®2,IP2) is the space of se- 
quences of special orthogonal matrices with P2 such that all finite- 
dimensional measures are product normalized Haar measures. On the other 
hand, the space (Oi, gi,Fi) induces infinite sequences of row-wise indepen- 
dent white noise triangular arrays {e„(t,T); 1 < t < T < oo}^^i under Pi, 
with common variance a^. 

First, we note that the ML estimators are consistent for every > 1 as 
T ^ 00, implying consistency of the mixing component estimator. 

Theorem 4.2 (MLE deconvolution consistency). Let 6{u!2\N) denote 
the true parameters for the profiles of a stochastic Radon transform of length 
N of the mixture expansion (4-4)! 

e{uJ2\N) = {{lj!'k\uJ2)]i<k<K,l<n<N^{<lk}l<k<K) 

and let 0{ijOi,lo2\T,N) denote the corresponding maximum likelihood decon- 
volution estimate based on the observed profile {lN{xt)}J=i- Then, 

(4.8) ^>i,cj2|T,iV) ^ e{u2\N). 

T — *oo 

To prove Theorem 4.2, we first state without proof a variant of a uniform 
weak law for triangular arrays due to Jennrich [25]. 

Lemma 4.1. Let {Xt^T}t<T be a triangular array of random variables 
with mean zero and variance (t\ < 00, such that elements of the same row 
are independent. Let {gt^xiG)} be a triangular array of continuous functions 
on a compact Euclidean set G satisfying 

1 ^ 

■^gt,T{di)gt,T{02) <oo. 



(4.9) lim sup 



T 

t=i 



Then, 



sup 

6*66 



1 ^ 



T 

t=i 



0. 

T^oo 



Proof of Theorem 4.2. It suffices to show that for P2-almost all uj2 
(4.10) e{u;i,u;2\T,N) ^ e{i02\N) 

T — *oo 

for then the bounded convergence theorem will imply that Ve > 0, 
lim F[\\e{u;i,u;2\T,N)-e{i02\N)\\>e] 

lim ¥i[\\9{uJi,uJ2\T,N) - e{iJ2\N)\\ > e]dF2 = 0. 

^2 
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To this aim, let Q be the support of 9, which is by definition compact, and 
define Qt{-) as follows: 

K 2 
Inixt) - qk4'{xt - Ati"^; 



n=lt=l 



k=l 



27r ^ ^ 

n=lt=l 

If we establish the existence of a deterministic function Q{9) such that: 

(1) supeee \Qt(.0) - Q{0)\ ^ 0, as T ^ cx), 

(2) sup,^ ||,_e-||>,Q(e) < Q{e), Ve > 0, 

(3) QTieiT,N))>QTie)-op,il), 

for P2-almost all uJ2, then relation (4.10) will immediately follow (e.g., van 
der Vaart [52]). We fix and suppress uj2 so that in what follows, random 
variables are to be seen as functions of loi only. Let 

(4.11) Q^0):= \\Mx)-^n{x\e)fdx-al 

where pn{-) = ^n{'\Q) is the nth profile (without noise contamination) and 
a1 is the variance of the noise component. For tidiness write 5n{x\9) := 
\pn{x) - <^n{x\0)\ so that Q{e) = -N~^J2nI Sl{x\9)dx - al- Furthermore, 
let 5l{xt\e) ■.= s\XY>^^_2Tv/T<y<xt^n{x\0) and 

n=lt=l 
^^-^ n=lt=l^t-2n/T<y<xt 

the second equality following form monotonicity of y on R^" . To verify 
condition (1), we must show convergence of supgg0 \Qt{0) — Q{0)\ to zero in 
probability. Using the triangle inequality on the uniform norm, one obtains 

(4.12) sup igT(^) - Q{e)\ < sup \QT{e) - Q*T{e)\ + sup |q?^(0) - q{6)\ . 

6*60 eee eee 

V ' " V ' 

A{T) B{T) 

For the term B{T), we note that /t(^) = IQri^) ~ Q(^)l is continuous on 
the compact set 0. Furthermore, by the definition of the upper Riemann- 
Stieltjes integral, we have that /t i pointwise as T ^ oo. It follows by 
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Dini's theorem that fx converges uniformly to zero, that is, B(T) = supgg0 ' 
Q{e)\ ^^0. Consider now A{T): 



2tt 



N T 



A{T) = ^ E T.[iln{xt) - <^n{xt\e)f - 5l\xt\e) - al] 



NT 
27r 



n=l t=l 

N T 



^j, E E[(/5n(xt) + en{xt) - ^n{xt\e)Y - 5l\xt\e) - 

71=1 t=\ 
N T 

E T.i^li^t\0) - Sf{xt\9) + {elixt) - al) 

+ 2en{xt){pn{xt) - ^ 



n=l t=l 



The triangle inequality yields 



sup I A(T)| < sup 
e»ee eee 



27r 



N T 



^EE(4(-.)--e^) 



sup 

6»ee 



n=l t=l 
N T 



27r 
iVT 



+ sup 



n=l t=l 
N T 



^ E E2e^(^*)(^«(^*) - ^n{xt\0)) . 
n=lt=l 

The first term on the right-hand side is functionally independent of 9 and 
converges to zero in Pi-probability as T ^ oo by the weak law of large 
numbers for triangular arrays. 

To see that the second term converges to zero, we notice that the function 
\-^J2n=i'l2t=ii^n{xt\9) — Sn"^ {xt\9))\ is coutinuous on the compact set 0, 
and, by definition of the upper Riemann-Stieltjes integral, 

N T 



t as T — > oo, 



TT^f EE('^n(xt|e)-<5f(xt|0)) 
n=lt=l 

pointwise in 9. It therefore follows from Dini's theorem that convergence to 
zero is uniform. 

Hence, it remains to show that the penultimate term converges to zero in 
Pi-probability. To this aim, we will use Lemma 4.1. To see that it applies 
here, we need to verify condition (4.9) for the array {ipt,T{9)} = {{p{xt) — 
^n{xt\9)}- Observe that, by definition of the upper Riemann-Stieltjes inte- 
gral, 



27r 



t=l 



< 



2-K 



El^t,r(^l)IIV't,T(02)| 



t=l 



RANDOM TOMOGRAPHY 



21 



2^^ 



<Tf^l^K{xt\0l)5l{xt\e2)i{5n{x\ei),5n{x\e2))2 

t=l 

< OO 

for all 01,02 G ©• Notice that the limit is a bounded function on 0x0. 
Dini's theorem implies that the convergence to this limit occurs uniformly. 
We have therefore verified that (1) holds. 

We now proceed to verify condition (2). When the location parame- 
ters {/i^"^}^]^ within a profile are distinct, the mapping {{qk, lJ'k^^)}^=i ^ 

J2f=iQk X is an injection. Since the uj2 for which the projected 

means are distinct is a set of P2-probability 1, condition (2) follows imme- 
diately from the fact that Q(-) [defined in (4.11)] is a norm on L^[— 7r,7r]. 
Condition (3) is trivially satisfied, by definition of 9t as the argmax of 
Qt{')- Finally, statements in the proof hold for all 002^^2^ and the proof is 
complete. □ 

Using Theorem 4.2, we establish the consistency of the estimator of the 
Gram component. 

Theorem 4.3 (Gram component consistency). Let p be as in (4-4)! 
and let G{T,N) denote the hybrid estimator of the Gram component of [p], 
Gram([p]), based on N independent profiles {In{xt)}J=i, n = 1, . . . , A''. Then, 
G is LP -consistent for Gram([/3]) in the sense that for every p > 0, there 
exists a sequence Tjy t co, such that 

(4.13) nG{TM,N) - Gram([p])||P 0, 
where \\ ■ \\f is the Frobenius matrix norm. 

Before proceeding with the proof, we give two lemmas without proof. 

Lemma 4.2. Let a(r) = (ai(T), . . . , a/^(T)) be a sequence of random 
measures on {1, . . . ,K} and ttt a sequence of random permutations, de- 
fined by theproperty that TrxiaiiT), . . . ,aK(T)) = (Q(i)(r), . . . ,a(/^)(r)) a.s. 
If ce = (ai, . . . , ax) is a measure with distinct components, then 

p p 

(4.14) qt — > a =^ ttt — > vr, 

T^oo T^oo 

where vr is defined by the property 7r(ai, . . . , ax) = (o(i), • • • > ^(if))- 

Lemma 4.3. IfWx is a sequence of random px p permutation matrices 

converging to a permutation matrix W in probability, and Xj- is a sequence 

P 

of random pxl vectors converging to X in probability , then WtXj' — >t_>oo 
WX. 
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Proof of Theorem 4.3. Let [i^^\n,T) be the ML estimator of the 
fcth location parameter within the nth projection. Let Sn(T,N) be the cor- 
responding estimate of the Gram matrix for the locations within the nth 
projection, Sn{T,N) = {{fiC\N,T),fif\N,T))]f^^^^, and Sn be the true 
Gram matrix corresponding to the true means in the nth projection. Then, 
we have 

(4.15) G{T,N) = -Y.Sn{T,N) and G(iV) = - ^ S„ 

n=l n=l 

and we can bound the LP distance (E||G(T, A^) — Gram([p])||^)^/P above by 

(E||G(^,iV)-G(iV)|r^)l/^ + (E||G(iV)-Gram(M)|r^)l/^ 
^ ^ ^ ^ ^ ^ 

A{T,N) B{N) 

It is straightforward that limTv^oo B{N) = 0, so we concentrate on A{T,N): 
9 ^ 

A{T,N) < - ^(E||5„(T,iV) - Snlf^f^ = 2(E2[Ei||5i(r, iV) - 5i||^])^/^ 

n=l 

Here we have used exchangeability and Fubini's theorem. By Theorem 4.2, 
we have that the maximum likelihood deconvolution estimates are P-weakly 
consistent (in the sense of convergence in probability) so that, for every 

iV G N, we have (5i(T, N), . . . , Sn{T, N)) ^t^oc {Si,...,Sn)- This is true 
by a combination of the continuous mapping theorem for convergence in 
probability, Lemmas 4.2 and 4.3. It follows by the fact that {S{T, N)}Ten 
is uniformly bounded so that E||S'i(r, N) - Sifp 0, for all iV G N (e.g., 
see Corollary 2.2.2, page 38 of Lukacs [36]). Now consider a nonnegative 
sequence converging monotonically to zero. Since liniT^oo ^iT, N) = 
for any value of N, then we can find a sequence {T^} such that A{T]sf,N) < 
bN, \/N. Taking the limit as iV ^ oo completes the proof. □ 

Taking the consistency results in Theorems 4.2 and 4.3 as a starting point, 
we may also prove weak convergence results for the mixing and Gram com- 
ponents. 

Theorem 4.4 (Mixing component CLT). Let qN^T{<-^i,^2) denote the 
maximum likelihood estimator based on N independent profiles {Ln{xt)}J^i, 
n = 1, . . . , N , of the mixing proportions q of the mixture given in (4-4 )■ Let 

Nt ] oo be a sequence dominated by T, such that qi^^^T{^i-,^2) — > (1 for T — > 
oo. Then, if ip is twice differentiable, it holds that 

r^oo exp{-yTFy/2} 



(4.16) F[y/WT{qNr,T{^i,^2)-q) G (y,y + ^^y)] ^ (2^)i^/2|j7| 1/2 dy, 
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where the entries of the matrix F are given by 

(4.17) Fij = ^^^[£ Hx\f^t)Hx\H)dx'^ 

with (p{x\fij) := ip{x, y\Afij)dy for A ~ Haar[S0(3)], j = 1,. . . ,K . 

Proof. As with the proof of Theorem 4.2, it will suffice to show that 
for P2-almost all uj2, relation (4.16) holds with Pi replacing P. We start 
out with a technical note, whose relevance will become clear later. Fix two 
indices i,j < K and consider the collection {fl^4>{y\^J'^l^'')(t>{y\^J^Y^) dy}n>i- 
This comprises an i.i.d. sequence in L}{U.2, g2,'^2) so that by the strong law 
of large numbers, the set Bi^j defined as 

{1 ^ l-TT 



has P2-probability 1. Hence, PlPli j — 1- Fo'^ the rest of the proof, we fix 
an arbitrary L02 G Hi j- j' which will not be explicitly written out. Theorem 
4.2, allows the following Taylor expansion of the gradient of the log-likelihood 
around the maximum likelihood estimator qx for large T: 

^KQNr,T) = Ve{q) + [V^£{q)]{qNr,T - q) + 0^{\\qNr,T " qf), 



SO that 



1 . 1 



Vi{q) + Tr^[y^m]VN^{qNr,T - q) 



+ ^OH\\qNr,T-q\\') = 0. 

In order to proceed, we calculate Vi and as follows: 
d 



dqi 



N T I K \ 

-2 EE "^(^i 1^1"'') U«(^t) - E 9fe'?^(^il/^fc"^) ' 
n=\t=\ \ k=\ ) 



N T 



dqidqj n=it=i 

Now, it can be seen that V^^ does not depend on G f^i. Fix two indices 
i,j^K and observe that by the triangle inequality, 

Y.Y.^(^t\fi't'^)^{xt\fif)--E / <P{y\fi,)<P{y\fij)dy 

n=lt=l •^-'^ 
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< 



N 



+ 



N 

E 

n=l 



T 



in). 



i=l 



1 

2^ 



1 1 

2^iV 



AT 

E 

n=l 



a(n,T) 



27r 



-E 



4>{y\f^i)4'{y\fi'j) dy 



f3(N) 

Choose any e > 0. An argument involving Dini's theorem, such as the one in 
the proof of Theorem 4.2, shows that the first term on the right-hand side 
converges to zero uniformly in /x^"^ and Mj"^- Hence, convergence to zero is 
uniform over n, too. As a result, we can choose a Tq such that a{n,T) < e/2 
for any T >Tq and for all n G N. Since uj2 & B, we can also choose an A^o 
such that for any N > Nq it holds that P{N) < e/2. Consequently, for any 
e > we can choose an M = Tq A Nq such that 



1 



1 



(t>{y\f^i)4>{y\H)dy 



< e 



AT'' 27rc72 
for T, A > M. Thus, we have established that, P almost surely 
1 



(4.18) 



AT 



{(V 



2^^ .T.Af-^ooJ 1 
ijj " 



I 2^a? 



E 



(l){x\ni)(l){x\nj)dx ^ = {-Fij} 



with F being the Fisher information matrix. This remains true when re- 
placing A by an increasing sequence A^-, and take the limit as T | oo. We 
now turn to show that the gradient of the log-likelihood satisfies a central 
limit theorem. Let At t oo be as in the statement of the theorem. Define a 
triangular array of random if-vectors {Ir.n} with 1 < n < A^ < T 



(4.19) 



Yt. 



1 ^ 

— ^ (5(xt,T, n)(t){xt,T, n), 



where {xt^r} is a regular lattice of T points on [— 7r,7r], and the vectors 
and scalar s 5 are defined, respectively, as 



(j){xt,T,n) :-- 



6{xt,T,n) :-- 



{In{xt,T) - Ek=l <lk(l){xt,T\f^'k^)) 
27rcr2 



We note that Eil^ = throughout the array. Furthermore, by independence, 

C0Vi[yT,n]=]El[lT,„>T!j 
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(4.20) 

Since we have fixed uj2 G Pit j -^ij) taking the hmit as T | oo yields 



Nt 



(4.21) 



T 1 T^cp 
T,n\ ~ 



n=l 



27rf72 



where the integral is understood element- wise. We now show that our tri- 
angular array satisfies a Lindeberg condition. We have 

Nt 



Y,E^[\\YT,nf;\\YT,n\\>e] 



n=l 



Nt 



n=l 



1 ^ 



T 



n)cl 



T 



VTN: 



T 



t=i 



> e 



where 1 G M'^ is a vector of I's and c > is an upper bound for all the 
elements of (p. These are differentiable and compactly supported functions 
of a; G [— 7r,7r] for every n. The 5^s are i.i.d. Gaussian random variables over 

n and T, so, by defining i.i.d. random variables Xn = X 9^(0, a'^c), the 
last line may be re- written as 

Nt 



1 



n=l 



l^ni|| ; 



1 



/Nt 



Xnl 



> e 



the last term converging to by virtue of the dominated convergence the- 
orem and X having finite variance. It follows from the multidimensional 
Lindeberg-Feller theorem (see, e.g.. Proposition 2.27 on page 20 of [52]) 
that 



(4.22) 



1 



Nt 



T 



n=l 



1 



27rf72 



Returning to the Taylor expansion, as T ^ oo the (NtT)-'^/'^ -scaled error 
vanishes in probability (by assumption) and Slutsky's lemma yields 



(4.23) VTN^{qNT,T-q)'^^^KiO,a^sF~^), lP2-a.s. 

This completes the proof. □ 

Theorem 4.5 (Gram component CLT). Let p he as in (4-4) o,nd let 
G(T,N) denote the estimator of the Gram component of [p], Gram([/9]), 
based on N independent profiles {In{xt)}f=i, n = 1, . . . , N . Then, there exists 
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T^r I oo such that the normalized difference G{ti\i,N) — Gram([p]) is asymp- 
totically distributed according to the matrix Gaussian distribution 

(4.24) Vn{G{tn,N) - Gram([p])) ^Kxic(0,S), 
where the covariance matrix T, is given by 

(4.25) S = (y^ V^) Cov[vec{r}](y ® V). 

Here, ^ stands for the Kronecker product, vec is the column stacking oper- 
ator, V is any 3 x X matrix satisfying V~^V = Gram([p]) and T is a 3 x 3 
random matrix with the following second-order properties: 

var(rii) = i var(rij) = ^ /or i / j, 

(4.26) 

cav{Tii,Tjj) = -jg fori^j 
and with uncorrelated off-diagonal elements. 

We give here the definition of the matrix Gaussian distribution (also see 
Chapter 2 of Nagar and Gupta [37]). 

Definition 4.1. Let M be an p x ?i real matrix, and let S and <I> be 
positive definite pxp and n x n matrices, respectively. A real random matrix 
X is said to have the matrix Gaussian distribution ^pxniM,Ti $) if 

vec(X^) ~^„(vec(M^),S® $). 

Proof of Theorem 4.5. Let Fn,t denote the distribution of ^/NG{T,N) 
and H denote the distribution 9iKxK{Gram{[p]),T,), that is, the "shifted" 
stipulated limiting distribution with expectation Gram ([/)]) instead of zero. If 
G{N) is as in the proof of Theorem 4.3, we denote hy Qn the distribution of 
VnG{N). It will suffice to show that, for some tn T co, dpr{Fr^^N,H) "-l^ 
0, where dpr denotes the Prokhorov metric, metrizing weak convergence 
(e.g., Billingsley [4]). Applying the triangle inequality, 

(4.27) dpr{Fr^,N,H) < dp,{Fr^^N,QN) + dpr{QN,H). 

Now let dp{X,Y) = mi{6 >0:F[d{X,Y) > 6] < 6} be the standard metric 
that metrizes convergence in probability, for d the metric on the range of 
the random variables. Now (ipr(Ai,A2) is the infimum of dp{X,Y) over all 
pairs {X,Y) of random variables with (Ai, A2) the respective distributions, 
provided that d induces a separable space (Dudley [13]). It follows that 
dpv{Frj^^N,QN) will converge to zero if we can show that 

(4.28) VNAiTN,N) = VNE\\GiTN,N) - GiN)f^ ""-^ 0. 
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For this to hold, it must be that A{tn,N) is o{N^'^^^°'^/'^) for some a > 0. 
Since HmT-»oo A{T, N) = for all € N, we can always choose such a Tjy. 

We have established that the first term in the right-hand side of (4.27) 
converges to zero. For the second term, we use the classical central limit 
theorem. The sequence {25„} is i.i.d. with mean Gram([p]) and covariance S 
(see (4.19) in Panaretos [42], with n = 2). Therefore, by the multidimensional 

central limit theorem, one 

In Theorems 4.3 and 4.5, the order of magnitude of T is made dependent 
on that of N. We discuss this briefly, starting from the setup in Theorem 4.3. 
When performing the deconvolution, we ask that the mixing proportions are 
the same for all profiles. As a result, the dimension of the unknown parameter 
grows with A^, so that we should make T depend on N. If the deconvolutions 
are performed independently for each profile, 

K 2 

k=l 

(4.29) 

n = l,2,...,iV, 

then exchangeability implies a "stronger" consistency result 
Ve > 3To,No:E\\G{T,N) - Gram([p])||^ < e 

(4.30) 

ViV> iVo and r> To. 

A compromise between the two extreme approaches of overall and separate 
sums of squares is to assign profiles into groups of size less than a fixed num- 
ber, not depending on N. In the case of overall optimization, more projection 
data may require better resolution T. In biological practice, of course, the 
instruments will give a certain — hopefully high — resolution. This depends 
on the current state of technology and can be thought of as being inflexible. 
On the other hand, the number of projections can become arbitrarily large, 
the only constraint being computing time (Glaeser [18]). 

Theorem 4.5 says that for T converging to infinity sufficiently fast as 
compared to N, we are in a consistent regime, and a classical central limit 
theorem applies for the estimated shape. A slow growth of T, however, 
may provide asymptotically biased estimators. If deconvolution is carried 
out separately for each profile, a standard ^/n-type central limit theorem 
for the maximum likelihood estimates of the projected location parameters 
applies for the MLE deconvolution within a single profile. A delta method 
argument subsequently implies that picking r^v of the order of N'^ for any 
K > 1 is sufficient in order to guarantee weak convergence to the stipulated 
limit. 



2vr ^ 



arg min — ^ 

((ni"h^ .,.lu("H^^ .^-^4=1 
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4.2. Exact deconvolution and a theorem of Caratheodory. Our analysis 
has underlined the difficulties posed by the unobservability of the radial mix- 
ture expansion parameters. Intuitively, there is no transform! It is natural 
to ask whether there are special cases where these expansions need not be 
estimated, but are available as a determinate aspect of the data at hand. 

Recall that the (deterministic) problem of deconvolution is the solution 
of a Fredholm integral equation of the first kind of the form J^^ g{t — 
y)h[y)dy = f{t), for g, when h and / are known. The singular value de- 
composition of the operator (j) g * (j) (e.g., Kanwal [30]) may be used to 
solve such an equation when the functions involved are square integrable. 
In the present case, however, the function g{x) = J2k=i'^k^{x — fik) is a 
weighted Dirac comb, which is not an element of L^. 

Let f{x) be a profile. In the absence of noise, our deconvolution prob- 
lem is described as f{x) = (j){x) * J2f=i'^k^ix — iJ-k), where / and (f) are 
known, and we wish to recover {/ifc}^^^ and {ak}j^^i- In practice, we ob- 
serve a discretely sampled profile {ft}Jl'^x/2^ f^- ~ /(^) ^ even (say). 
Therefore, if we apply the inverse discrete Fourier transform we obtain 
d~f^{K-)/<P{i^) ~ J2k=i Q-k^^^'''^ for T large, assuming that 0(k) does not vanish 

for K G {27rt/r}^^_yy2- This translates our deconvolution problem into one 
of frequency identification, allowing us to invoke the following theorem. 

Theorem 4.6 (Caratheodory [7]). Let {cfc}^~Q he complex constants, 
at least one of which is nonzero (n > 1). Then, there exist an integer m, 
1 < m < n, real numbers f3j > and distinct frequencies ujj G (— 7r,7r] (j = 
1, . . . ,m) such that the Ck can be uniquely represented as 

m 

(4.31) Ck = Y.I3je'^^\ fc = 0,...,n-l. 

Setting w{k) = dJ^{K)/(f>{K), we have that w{—k) = w{k), and so we are 

in the setting w{n) = X^aLi '^fc^*'^*'^ ^'^d want to recover the a^'s and ^fc's. 
Caratheodory's theorem assures us that, for T large and for appropriate 
densities (p, there exists a unique solution to our deconvolution problem with 
probability 1 (owing to our Haar measure assumption). Pisarenko [45] hinged 
on a constructive proof of Caratheodory's result due to Grenander and Szego 
[21] to determine the hidden frequencies {ujj}: 

1. Build the matrix C = {ci-j}fj^i. 

2. Find the eigenvector v = {vq, . . . , Vm) corresponding to the smallest eigen- 
value (assuming no multiplicity) of the Toeplitz matrix Cm, the top-left 
submatrix of C of dimension m x m. 

3. Find the K distinct unit roots {e^'^^ IJLi of En=o^^n^" = 0- 
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As with any deconvolution problem, this method will be sensitive to the 
presence of noise. Li and Speed [35], in studying electrophoresis experiments, 
report that the method is fairly robust to the presence of noise and can be 
used to provide starting values for a maximum likelihood procedure. Such 
an approach is unlikely to be succesful in our setting, as, for certain profiles, 
two (at least) spikes may fall within a critical distance e of each other, 
rendering the method very sensitive to noise unless T is overwhelmingly 
large. The dependence of deconvolution on the relationship between T and 
K is connected with the Rayleigh limit and the problem of superresolution 
(see Donoho et al. [11]). Indeed, we have a nearly black object to recover; 
the £o-iiorm of the unknown signal (spike train) is much smaller than T. 

The most important aspect in our case, though, is that the information 
across different profiles can be used to gain insights about the location of 
the spikes within each particular profile, so that spikes that lie close to one 
another could potentially be identified. In the noiseless case, Caratheodory's 
theorem guarantees that we will still recover an expansion, namely that 
which combines the almost coincident spikes into a single spike. 

An extension of the method of Pisarenko to higher dimensions is not 
straightforward. The one-dimensional result can employed, however, in a 
coordinate- wise fashion (on the one-dimensional marginals of every projec- 
tion). 

5. Two examples. The dimension of the search space and the form of 
the objective function (4.6) render the practical solution of the optimization 
problem challenging in its own right. We will not pursue it here, as it is 
the subject of a separate investigation. However, in order to illustrate both 
the problem and the application of the hybrid estimator, we consider two 
mixture data sets, that are purposely chosen to be sparse and noise free, so 
that Pisarenko's method may be applied. 

5.1. A two-dimensional Gaussian mixture. Assume that we observe a 
finite sample of = 150 profiles from the stochastic Radon transform of the 
function 



with a = 0.3 and fii = (0.6,0)"^, /X2 = (0.6,0.8)"^, ^3 = (-0.1,0.1)"^, ^4 = 
(—1,-0.3)^, /is = (—0.2,-0.6)''^. The choice of location parameters and 
standard deviation was made so as to ensure a certain sparsity. Figure 4(a) 
gives a contour and intensity plot of the mixture density. 




(5.1) 
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(a) (b) 

Fig. 4. (a) Contour plot of the density p superimposed on its intensity plot, with dots 
indicating the locations of the means, (b) Superimposition of the contour plots of the true 
(red) and estimated (blue) densities. 

The 150 profiles are digitized on a grid of T = 256 regular lattice points. 
A sample of six profiles from this stochastic Radon transform is presented 
in Figure 5. The projection angles are uknown. 

The deconvolution was performed separately for each profile, using only 
Pisarenko's method, which yielded extremely accurate results, and then the 




Si^ifnrl SuftHrt suppott 



Fig. 5. Six sample profiles from the 150 profiles of the realisation of the stochastic Radon 
transform of the density given in (5.1). 
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1 



(a) 



(b) 



Fig. 6. Superimposed bootstrap replicates as a means of assessment of uncertainty. Pan- 
els (a) and (b) contain the superimposition of 15 and 100 replications, respectively. 

hybrid estimator was constructed. Figure 4(b) contains the contours of the 
estimated density superimposed on those of the true density. 

A challenging problem is uncertainty estimation and presentation. Moti- 
vated by Brillinger et al. [6] and Brillinger, Downing and Glaeser [5], who 
exploited symmetries to assess variability, we employ a bootstrap approach. 
We resample the 150 profiles and construct bootstrap replicates of the esti- 
mated density. We then superimpose the contour plots (see Figure 6). The 
rule of thumb is that the more tangled the contours appear the more un- 
certainty is associated with that particular region. The important aspect of 
these figures is that the overall shape is seen to be preserved and not to be 
highly variable. 

5.2. A three-dimensional Gaussian mixture. Next, we consider a sparse 
Gaussian mixture in three dimensions. We observe = 150 profiles from 
the stochastic Radon transform of the Gaussian mixture 



with a = 0.46, qi = 2, q2 = 3, qs = 2.4, (?4 = 4 and {/Uj} given as fii = 



(0,0.8,-0.3)^, /i2 = (0.7,-0.4,-0.3)^, /Ug = (-0.7, -0.4, -0.3)^ , = 



(0,0,0.8) . Visualization of this synthetic particle is challenging since we 
must visualize level surfaces rather than contours, via an isosurface plot [Fig- 
ure 8(a)]. Again, we notice that the mixture is sparse in the sense previously 
described. A sample of four profiles from the realization of the stochastic 




(5.2) 
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Fig. 7. Intensity plots for a sample of four profiles from the stochastic Radon transform 
of p. These images would correspond to the data yielded by the electron microscope. 

Radon transform is depicted in Figure 7. The corresponding Euler projec- 
tion angles are unknown. 

Deconvolution was carried out separately for each image, and the mixing 
coefficients were estimated as before. In particular, we used a naive de- 
convolution approach, applying Pisarenko's method to the one-dimensional 
marginals of each projection (which worked well here due to sparsity). A 
visual comparison of the estimated and the true density is given in Figure 
8, where isosurface plots are given for both densities. 

6. Concluding remarks. We formulated and investigated a problem of 
statistical tomography where the projection angles are random and unob- 
servable. The problem was seen to be ill-posed since unobservability of the 
projection angles limits us to consider inference modulo an appropriate or- 
thogonal group. For essentially bounded and compactly supported densities, 
these invariants were seen to be identifiable and the problem of their recovery 
was phrased as an estimation problem. The abstraction involved in modu- 




(b) 



Fig. 8. (a) Isosurface plots from three different perspective for the density p. (b) Isosur- 
face plots from the same three perspecitves of the estimated density. 
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lar inference may render the pursuit of a solution in the most general case 
overly ambitious. Indeed it is initially not at all clear how an estimator can 
be constructed. Nevertheless, it was seen that once the problem is "over- 
regularized," so as to become an essentially parametric problem, one may 
develop an appropriate framework for modular inference, and obtain consis- 
tent estimators. In particular, assuming that the unknown function admits 
a radial mixture representation (or, more generally, a radial basis function 
representation), consistent estimation may be performed hinging on ideas 
from D. G. Kendall's Euclidean shape theory, without making any attempt 
to estimate the unknown projection angles. In this setup, a hybrid estimator 
was constructed whose determination requires a deconvolution step and an 
inversion step, the former being the more challenging one. 

Tomography with unknown projection angles arises in singe particle elec- 
tron microscopy, and biophysicists typically proceed by estimating the un- 
known angles by means of a prior model (called a low resolution model). 
This low resolution model initializes an iterative procedure that estimates 
the angles, updates the estimate of the particle and cycles until convergence. 
Low resolution models often originate from an ad hoc "naked eye examina- 
tion" of the projections by the experienced scientist, who uses his visual 
intuition to circumvent the lack of angular information. In fact, many such 
models comprise an ensemble of solid balls in space, and can be quite suc- 
cessful as starting models, provided that the particle has enough symmetries 
to enable this ad hoc "naked eye" model construction. The results in this 
paper suggest that, potentially, it could be practically feasible to construct 
an "objective" prior model based solely on the data at hand, neither requir- 
ing symmetries, nor attempting to estimate the unknown angles. From one 
point of view, the approach described can be thought of as a mathematical 
formalization of the biophysicist's visual intuition. Indeed, the radial expan- 
sion setup investigated can provide a fruitful framework for the construction 
of initial models (R. M. Glaeser, personal communication) and its applica- 
tion to the single-particle setup is the subject of ongoing work. On a more 
theoretical level, the identifiability results presented establish the feasibility 
of reconstruction from single particle data (up to a coordinate system). 

Finally, we add a few comments on the estimation framework. Through- 
out the paper, the number of mixing components K of the unknown density 
has been assumed to be known — for example, the scientist will have some 
insight in its choice. Nevertheless, K could also be treated as an unknown 
parameter to be estimated, this being a standard problem in mixture esti- 
mation (see James, Priebe and Marchette [24], and references therein). In 
fact, more unknown parameters can be introduced, as long as the mixture 
remains identifiable. Investigation of how an EM approach could be adapted 
for this simultaneous deconvolution problem would be of interest in this case. 
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Another assumption that was made was regarding the isotropy of the Eu- 
lerian projection angles. In the anisotropic case, one can hardly proceed at 
all when the projection angles are unobservable; any re-weighting scheme 
would be ill-defined. The determination of an algorithm that would perform 
the simultaneous deconvolution step in a real setting is a problem that is of 
interest in its own right. Connections with discrete sparse inverse problems, 
such as those studied in Donoho et al. [11], are especially relevant here. 
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